8 research outputs found

    A Network-Based Embedding Method for Drug-Target Interaction Prediction

    Molecular footprint of Medawar's mutation accumulation process in mammalian aging

    Medawar's mutation accumulation hypothesis explains aging by the declining force of natural selection with age: slightly deleterious germline mutations expressed in old age can drift to fixation and thereby lead to aging-related phenotypes. Although widely cited, empirical evidence for this hypothesis has remained limited. Here, we test one of its predictions: that genes relatively highly expressed in old adults should be under weaker purifying selection than genes relatively highly expressed in young adults. Combining 66 transcriptome datasets (including 16 tissues from five mammalian species) with sequence conservation estimates across mammals, we report that the overall conservation level of expressed genes is lower at old age compared to young adulthood. This age-related decrease in transcriptome conservation (ADICT) is systematically observed in diverse mammalian tissues, including the brain, liver, lung, and artery, but not in others, most notably the muscle and heart. Where observed, ADICT is driven partly by poorly conserved genes being up-regulated during aging. In general, the more often a gene is found up-regulated with age among tissues and species, the lower its evolutionary conservation. Poorly conserved and up-regulated genes have overlapping functional properties that include responses to age-associated tissue damage, such as apoptosis and inflammation. Meanwhile, these genes do not appear to be under positive selection. Hence, genes contributing to old age phenotypes are found to harbor an excess of slightly deleterious alleles, at least in certain tissues. This supports the notion that genetic drift shapes aging in multicellular organisms, consistent with Medawar's mutation accumulation hypothesis.

    Transcriptomic network analysis of brain aging and Alzheimer's disease (Beyin yaşlanması ve alzheimer hastalığının transkriptomik ağ analizi)

    Multiple studies have investigated aging brain transcriptomes to identify age-dependent expression changes and determine genes that may participate in age-related dysfunction. However, aging is a highly complex and heterogeneous process in which multiple genes contribute at different levels depending on individuals’ environments and genotypes. Both this biological heterogeneity of aging and the technical biases and weaknesses inherent to transcriptome measurements limit the information gained from a single dataset. Here we propose using network analysis to reproducibly identify aging-related gene interactions shared across different datasets. We employ the prize-collecting Steiner forest algorithm to create aging networks from human brain transcriptome datasets. The algorithm calculates the optimal interaction set among aging-related genes within a protein-protein interaction (PPI) network, taking into consideration the expression-age correlation coefficients of the genes most differentially expressed with age and the PPI confidence scores. This allows aging-related genes to interact either directly or through intermediate nodes. The intermediate nodes, in turn, can represent genes undetected in transcriptome data due to low light intensity, technical inefficiency of platforms, or aging-related molecular changes that do not involve mRNA abundance change, such as aging-related post-translational modifications. Using the predicted networks, we performed network alignment to test whether common interactions might be found in different tissues’ aging networks. In addition, we extend the approach to compare molecular changes during aging and in Alzheimer’s disease. We hypothesize that network alignment will help highlight the most relevant gene clusters and pathways shared between the two processes. M.S. - Master of Science
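
    The trade-off that a prize-collecting Steiner forest balances can be sketched as a scoring function. This is a minimal illustration, not the thesis implementation: it assumes the commonly used objective (prizes of omitted nodes, plus costs of included edges, plus a per-tree penalty), assumes edge cost is 1 minus the PPI confidence, and uses hypothetical node names, prizes, and parameters `beta` and `omega`.

```python
def pcsf_objective(prizes, edge_costs, forest_edges, beta=1.0, omega=1.0):
    """Score a candidate forest: missed prizes + edge costs + per-tree penalty."""
    nodes_in_forest = {node for edge in forest_edges for node in edge}
    # Prizes of aging-related genes left out of the forest count as a penalty.
    missed = sum(p for node, p in prizes.items() if node not in nodes_in_forest)
    # Assumption: edge cost = 1 - PPI confidence, so reliable edges are cheap.
    cost = sum(edge_costs[frozenset(edge)] for edge in forest_edges)

    # Count trees (connected components) with a small union-find.
    parent = {node: node for node in nodes_in_forest}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in forest_edges:
        parent[find(a)] = find(b)
    n_trees = len({find(node) for node in nodes_in_forest})

    return beta * missed + cost + omega * n_trees


# Hypothetical toy input: A, B, C carry prizes (e.g. |expression-age correlation|);
# X has no prize and acts as an intermediate node linking A and B.
prizes = {"A": 0.8, "B": 0.6, "C": 0.1}
edge_costs = {frozenset(("A", "X")): 0.1, frozenset(("X", "B")): 0.2}
forest = [("A", "X"), ("X", "B")]
score = pcsf_objective(prizes, edge_costs, forest)
```

    In this toy case the forest pays for two cheap edges through the unprized node X and forgoes the small prize on C, which is the sense in which aging-related genes can be connected through intermediate nodes. An actual solver searches over forests to minimize such a score; the function here only evaluates one candidate.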

    Network-based embedding methods for multi-omics data analysis

    The development of high-throughput technologies has resulted in a significant increase in data, opening up new opportunities to study and better understand how biological systems dynamically interact. Network analysis, which is based on graph theory, can offer a framework for combining these high-throughput multi-omics datasets and examining interactions inside biological systems. Even though it is beneficial to analyze vast volumes of data within the networks, employing traditional statistical approaches on these large networks can be challenging. Network embedding methodologies can be used to effectively handle the complexity of large biological networks. Network embedding approaches can represent node connectivity in large-scale networks as low dimensional vectors, which minimizes the downstream analysis complexity. These lower-dimensional vectors can then be used in machine learning prediction tasks with a wide range of applications in computational biology and bioinformatics. In recent years, there has been considerable interest in the application of network embedding methods in biological studies. Following a comprehensive review of the uses of network embedding techniques for link prediction tasks in biological networks, this thesis aims to (1) investigate different approaches to the creation of embedding features from biological networks by taking into account the sparsity and incompleteness of biological networks and the trade-off between local and global structure properties in network embedding methods; (2) assess the use of these embeddings in supervised learning link prediction tasks, taking into account the imbalances in class distributions observed in biological networks; (3) investigate approaches to integrate network embedding approaches in order to obtain more comprehensive vector representations from networks. 
    To obtain embeddings from the networks, two novel pipelines were designed in this thesis: Individual Network Embedding (INE) and Heterogeneous Network Embedding (HeNE). INE applies network embedding methods to each input network individually and then integrates the embeddings obtained from the different networks. In contrast, HeNE first integrates the networks and then applies network embedding methods to the integrated network. The embeddings created by these pipelines were used in supervised learning models to predict drug-target interactions (DTI), protein-disease interactions (PDI), and miRNA-disease interactions (MDI). The results show that in DTI and PDI, performance is slightly higher when the INE pipeline is applied: the average AUROC values are 0.92 (INE) and 0.91 (HeNE) in DTI, and 0.94 (INE) and 0.93 (HeNE) in PDI. In contrast, HeNE is more advantageous in MDI, where the networks are sparse and network integration improves connectivity; the average AUROC values in MDI are 0.77 (INE) and 0.87 (HeNE). The latter part of this thesis focuses on developing ensemble learning methods. Ensemble learning is a framework for integrating several embedding techniques for use in prediction tasks. This framework aims to assess whether combined vector representations from multiple embedding methods offer complementary information about network features and thus better performance on prediction tasks. Within this framework, different network embedding methods can be compared, and the effects of sampling methods, feature selection methods, cross-validation type, and cross-validation parameters can be tested on the integrated embeddings. Here, the integration of embeddings is compared with the individual embedding methods in the prediction of DTIs, PDIs, and MDIs.
    The results show that embedding integration slightly outperforms some of the individual embedding methods, likely because the integrated embeddings capture more comprehensive information from the networks. Furthermore, the results show that eliminating some of the embedding methods from ensemble learning can change the prediction performance, although the changes caused by eliminating these network embeddings are not significant. The studies conducted in this thesis contribute to examining the application of network embedding methods to biological networks in order to reduce the analysis complexity of large networks. The results show that the performance of the INE and HeNE pipelines depends on the task and the networks used. In conclusion, future studies should consider and test both pipelines to find the approach with the highest performance. This thesis additionally concludes that the integration of embedding methods in link prediction tasks may result in the accumulation of more comprehensive network features, which improves link prediction performance.
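
    The difference between the two pipelines described in this abstract (embed then integrate, versus integrate then embed) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the embedding or integration methods, so a truncated SVD of the adjacency matrix stands in for the embedding, concatenation stands in for INE's integration step, and an element-wise edge union stands in for HeNE's network integration. All function names and the toy networks are hypothetical, and both networks are assumed to share one node set.

```python
import numpy as np


def svd_embed(adj, dim):
    """Stand-in embedding: top-`dim` left singular vectors scaled by sqrt(singular value)."""
    u, s, _ = np.linalg.svd(adj, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


def ine(networks, dim):
    """INE-style: embed each network separately, then concatenate the embeddings."""
    return np.hstack([svd_embed(adj, dim) for adj in networks])


def hene(networks, dim):
    """HeNE-style: merge the networks first (edge union), then embed the merged network."""
    merged = np.maximum.reduce(networks)
    return svd_embed(merged, dim)


# Two hypothetical networks over the same four nodes.
# Node 3 is isolated in net_a and only gains neighbours through net_b.
net_a = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]], dtype=float)
net_b = np.array([[0, 0, 0, 1],
                  [0, 0, 0, 0],
                  [0, 0, 0, 1],
                  [1, 0, 1, 0]], dtype=float)

ine_features = ine([net_a, net_b], dim=2)    # shape (4, 4): 2 dims per network
hene_features = hene([net_a, net_b], dim=2)  # shape (4, 2): 2 dims from the merged network
```

    In this toy input, node 3 has no edges in `net_a`, so the part of its INE feature vector that comes from `net_a` is zero, whereas the merged network used by the HeNE-style function gives it neighbours. That loosely mirrors the abstract's observation that integrating sparse networks before embedding can improve connectivity. Either feature matrix could then be passed to a supervised classifier for link prediction.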

    Impaired inhibitory GABAergic synaptic transmission and transcription studied in single neurons by Patch-seq in Huntington's disease

    Transcriptional dysregulation in Huntington's disease (HD) causes functional deficits in striatal neurons. Here, we performed Patch-sequencing (Patch-seq) in an in vitro HD model to investigate the effects of mutant Huntingtin (Htt) on synaptic transmission and gene transcription in single striatal neurons. We found that expression of mutant Htt decreased the synaptic output of striatal neurons in a cell autonomous fashion and identified a number of genes whose dysregulation was correlated with physiological deficiencies in mutant Htt neurons. In support of a pivotal role for epigenetic mechanisms in HD pathophysiology, we found that inhibiting histone deacetylase 1/3 activities rectified several functional and morphological deficits and alleviated the aberrant transcriptional profiles in mutant Htt neurons. With this study, we demonstrate that Patch-seq technology can be applied both to better understand molecular mechanisms underlying a complex neurological disease at the single-cell level and to provide a platform for screening for therapeutics for the disease.

    Investigating genome and transcriptome evolution through next-generation molecular data analysis (Yeni nesil moleküler veri analizi yoluyla genom ve transkriptom evriminin incelenmesi)

    Computational analysis of large-scale molecular datasets, such as whole-genome sequencing data, genome-wide or exome-wide polymorphism data, microarray and RNA-sequencing data, and GC-MS metabolite data, has now made it possible to answer many questions that have long occupied biologists. In our research group, we will try to answer the following questions on genome and transcriptome evolution over the coming year:
    - Why and how does the testis transcriptome evolve among primates?
    - How can the most significant transcriptome differences between species be identified?
    - What are the reasons for the rapid spread, within the human population, of mutations that create short nucleotide homopolymers genome-wide?
    - What causes the changes in metabolite levels and gene expression during aging?
    - Could Neanderthal admixture in the Anatolian human population differ from that in other populations?
    - What are the past migration patterns in Anatolia?
    - Are the synchronous tumors seen in bladder cancer related?

    Somatic copy number variant load in neurons of healthy controls and Alzheimer’s disease patients

    The possible role of somatic copy number variations (CNVs) in Alzheimer’s disease (AD) aetiology has been controversial. Although cytogenetic studies suggested increased CNV loads in AD brains, a recent single-cell whole-genome sequencing (scWGS) experiment, studying frontal cortex brain samples, found no such evidence. Here we readdressed this issue using low-coverage scWGS on pyramidal neurons dissected via both laser capture microdissection (LCM) and fluorescence activated cell sorting (FACS) across five brain regions: entorhinal cortex, temporal cortex, hippocampal CA1, hippocampal CA3, and the cerebellum. Among reliably detected somatic CNVs identified in 1301 cells obtained from the brains of 13 AD patients and 7 healthy controls, deletions were more frequent compared to duplications. Interestingly, we observed slightly higher frequencies of CNV events in cells from AD compared to similar numbers of cells from controls (4.1% vs. 1.4%, or 0.9% vs. 0.7%, using different filtering approaches), although the differences were not statistically significant. On the technical aspects, we observed that LCM-isolated cells show higher within-cell read depth variation compared to cells isolated with FACS. To reduce within-cell read depth variation, we proposed a principal component analysis-based denoising approach that significantly improves signal-to-noise ratios. Lastly, we showed that LCM-isolated neurons in AD harbour slightly more read depth variability than neurons of controls, which might be related to the reported hyperploid profiles of some AD-affected neurons.
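
    One way a principal component analysis-based denoising step of this kind could work is sketched below. The abstract does not describe the procedure, so this is an assumption-laden illustration rather than the authors' method: it assumes a cells x bins matrix of normalized read depth and assumes that the leading principal components capture technical variation shared across cells, which is then subtracted. The function name, the `n_remove` parameter, and the synthetic data are hypothetical.

```python
import numpy as np


def remove_top_components(depth, n_remove=1):
    """Subtract the top `n_remove` principal components from a cells x bins matrix."""
    centered = depth - depth.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Reconstruct only the leading components (assumed technical noise) ...
    shared_noise = (u[:, :n_remove] * s[:n_remove]) @ vt[:n_remove]
    # ... and remove them, keeping the per-bin mean and the residual signal.
    return depth - shared_noise


# Synthetic example: a wave-like artefact along the genome whose strength
# varies from cell to cell, plus a small amount of random noise.
rng = np.random.default_rng(0)
n_cells, n_bins = 20, 50
wave = np.sin(np.linspace(0, 6, n_bins))
strength = np.linspace(-1, 1, n_cells)[:, None]
depth = 1.0 + strength * wave + 0.01 * rng.normal(size=(n_cells, n_bins))

denoised = remove_top_components(depth, n_remove=1)
before = depth.std(axis=1).mean()    # mean within-cell read depth variation
after = denoised.std(axis=1).mean()
```

    On this synthetic input the within-cell variation drops after the leading component is removed. With real data, the number of components to remove would need care, because a recurrent true CNV signal could also load on a leading component.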

    The Demographic Development of the First Farmers in Anatolia

    The archaeological documentation of the development of sedentary farming societies in Anatolia is not yet mirrored by a genetic understanding of the human populations involved, in contrast to the spread of farming in Europe [1–3]. Sedentary farming communities emerged in parts of the Fertile Crescent during the tenth millennium and early ninth millennium calibrated (cal) BC and had appeared in central Anatolia by 8300 cal BC [4]. Farming spread into west Anatolia by the early seventh millennium cal BC and quasi-synchronously into Europe, although the timing and process of this movement remain unclear. Using genome sequence data that we generated from nine central Anatolian Neolithic individuals, we studied the transition period from early Aceramic (Pre-Pottery) to the later Pottery Neolithic, when farming expanded west of the Fertile Crescent. We find that genetic diversity in the earliest farmers was conspicuously low, on a par with European foraging groups. With the advent of the Pottery Neolithic, genetic variation within societies reached levels later found in early European farmers. Our results confirm that the earliest Neolithic central Anatolians belonged to the same gene pool as the first Neolithic migrants spreading into Europe. Further, genetic affinities between later Anatolian farmers and fourth to third millennium BC Chalcolithic south Europeans suggest an additional wave of Anatolian migrants, after the initial Neolithic spread but before the Yamnaya-related migrations. We propose that the earliest farming societies demographically resembled foragers and that only after regional gene flow and rising heterogeneity did the farming population expansions into Europe occur.